This page summarizes the demographic and educational characteristics of the 634 students included in this dataset.

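library(tidyverse)  # read_csv(), dplyr verbs, forcats, tidyr, ggplot2
library(gtsummary)  # tbl_summary() and its helper functions
library(plotly)     # ggplotly() and layout()

sleep_df <- read_csv("data/cmu-sleep.csv") |> 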
  janitor::clean_names() |> 
  mutate(demo_race = case_when(demo_race == 0 ~ "Underrepresented", 
                               demo_race == 1 ~ "Non-underrepresented"),
         demo_race = fct_relevel(demo_race, "Underrepresented"),
         demo_gender = case_when(demo_gender == 0 ~ "Male", 
                                     demo_gender == 1 ~ "Female"),
         demo_gender = fct_relevel(demo_gender, "Male"), 
         demo_firstgen = case_when(demo_firstgen == "0" ~ "Non-first gen", 
                                   demo_firstgen == "1" ~ "First-gen", 
                                   TRUE ~ NA),
         demo_firstgen = fct_relevel(demo_firstgen, "Non-first gen"), 
         time_collection = case_when(cohort == "lac1" ~ "Spring, 2018", 
                            cohort == "lac2" ~ "Spring, 2017",
                            cohort == "nh" ~ "Spring, 2016",
                            cohort == "uw1" ~ "Spring, 2018", 
                            cohort == "uw2" ~ "Spring, 2019"), 
         cohort = case_when(cohort == "lac1" ~ "CMU", 
                            cohort == "lac2" ~ "CMU",
                            cohort == "nh" ~ "NDU",
                            cohort == "uw1" ~ "UW", 
                            cohort == "uw2" ~ "UW"))

Demographic Distribution of Observations

sleep_df <- sleep_df |> 
  rename(`Cohort` =`cohort`, 
         `Race` =`demo_race`, 
         `Gender` =`demo_gender`, 
         `First-Generation` =`demo_firstgen`, 
         `Relative Course Load`= `zterm_units_zof_z`, 
         `End-of-term GPA` =  `term_gpa`, 
         `Cumulative GPA` = `cum_gpa`)

#write_csv(sleep_df, "data/cleaned_cmu_sleep.csv")

summary_tbl <- sleep_df |> 
  select(`Cohort`, `Race`, `Gender`, `First-Generation`, time_collection,
         `Relative Course Load`, `End-of-term GPA`, `Cumulative GPA`) |> 
  tbl_summary(
    by = `Cohort`,
    statistic = list(
      all_continuous() ~ "{median} ({p25}, {p75})",
      all_categorical() ~ "{n} / {N} ({p}%)"), 
    label = list(
      time_collection ~ "Time")) |> 
  add_overall() |> 
  modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Cohort**") |> 
  bold_labels() |>
  italicize_labels() |> 
  add_p()

as_kable_extra(summary_tbl)
| Characteristic | Overall, N = 634 (1) | Cohort: CMU, N = 208 (1) | Cohort: NDU, N = 147 (1) | Cohort: UW, N = 279 (1) | p-value (2) |
|---|---|---|---|---|---|
| Race | | | | | |
| Underrepresented | 119 / 633 (19%) | 31 / 208 (15%) | 30 / 146 (21%) | 58 / 279 (21%) | |
| Non-underrepresented | 514 / 633 (81%) | 177 / 208 (85%) | 116 / 146 (79%) | 221 / 279 (79%) | |
| Unknown | 1 | 0 | 1 | 0 | |
| Gender | | | | | |
| Male | 263 / 631 (42%) | 82 / 206 (40%) | 76 / 147 (52%) | 105 / 278 (38%) | |
| Female | 368 / 631 (58%) | 124 / 206 (60%) | 71 / 147 (48%) | 173 / 278 (62%) | |
| Unknown | 3 | 2 | 0 | 1 | |
| First-Generation | | | | | |
| Non-first gen | 526 / 629 (84%) | 190 / 208 (91%) | 138 / 146 (95%) | 198 / 275 (72%) | |
| First-gen | 103 / 629 (16%) | 18 / 208 (8.7%) | 8 / 146 (5.5%) | 77 / 275 (28%) | |
| Unknown | 5 | 0 | 1 | 4 | |
| Time | | | | | |
| Spring, 2016 | 147 / 634 (23%) | 0 / 208 (0%) | 147 / 147 (100%) | 0 / 279 (0%) | |
| Spring, 2017 | 77 / 634 (12%) | 77 / 208 (37%) | 0 / 147 (0%) | 0 / 279 (0%) | |
| Spring, 2018 | 271 / 634 (43%) | 131 / 208 (63%) | 0 / 147 (0%) | 140 / 279 (50%) | |
| Spring, 2019 | 139 / 634 (22%) | 0 / 208 (0%) | 0 / 147 (0%) | 139 / 279 (50%) | |
| Relative Course Load | 0.04 (-0.60, 0.56) | -0.13 (-0.75, 0.56) | NA (NA, NA) | 0.04 (-0.50, 0.37) | 0.4 |
| Unknown | 147 | 0 | 147 | 0 | |
| End-of-term GPA | 3.56 (3.23, 3.81) | 3.49 (3.06, 3.78) | 3.71 (3.56, 3.89) | 3.50 (3.17, 3.79) | <0.001 |
| Cumulative GPA | 3.56 (3.23, 3.79) | 3.51 (3.11, 3.79) | 3.71 (3.44, 3.83) | 3.49 (3.17, 3.73) | <0.001 |

(1) n / N (%); Median (Q1, Q3)
(2) Pearson’s Chi-squared test; Kruskal-Wallis rank sum test


The dataset consists of first-year students from three different kinds of universities: Carnegie Mellon (CMU), a STEM-focused private university; the University of Washington (UW), a large public university; and Notre Dame (NDU), a private Catholic university. In each cohort, roughly 15-21% of students have an underrepresented racial identity (15% at CMU, 21% at both NDU and UW). However, the male-to-female ratio and the proportion of first-generation students vary between the cohorts: NDU is 52% male compared with 40% at CMU and 38% at UW, and 28% of UW students are first-generation compared with 8.7% at CMU and 5.5% at NDU.
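The first-generation comparison can be checked directly from the counts in the summary table, without the raw data. The sketch below is a minimal base R illustration, not the exact call that `add_p()` makes internally: it rebuilds the cohort-by-status contingency table from the reported counts (students with unknown status excluded), computes within-cohort percentages, and runs a Pearson chi-squared test of independence. The object names `first_gen`, `first_gen_pct`, and `first_gen_test` are illustrative.

```r
# Counts copied from the summary table (rows: cohort, columns: status).
first_gen <- matrix(
  c(190, 18,   # CMU
    138,  8,   # NDU
    198, 77),  # UW
  nrow = 3, byrow = TRUE,
  dimnames = list(
    cohort = c("CMU", "NDU", "UW"),
    status = c("Non-first gen", "First-gen")
  )
)

# Row-wise percentages: share of each status within a cohort.
first_gen_pct <- round(100 * prop.table(first_gen, margin = 1), 1)
print(first_gen_pct)

# Pearson's chi-squared test of independence between cohort and status.
first_gen_test <- chisq.test(first_gen)
print(first_gen_test)
```

The same pattern applies to the Race and Gender rows by swapping in their counts from the table.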

Relative Course Load

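# Draw violin plots of the three academic measures, split by one
# demographic column whose name is passed as a string in `demo`.
plot_function <- function(demo) {
  plot <- sleep_df |> 
    # all_of() selects the column named by the string; a bare symbol from
    # rlang::sym() is not a valid tidyselect input
    drop_na(all_of(demo)) |> 
    pivot_longer(
      cols = c(`Relative Course Load`, `End-of-term GPA`, `Cumulative GPA`), 
      names_to = "academic_info", 
      values_to = "unit"
    ) |> 
    mutate(academic_info = fct_relevel(academic_info, "Relative Course Load", "End-of-term GPA", "Cumulative GPA")) |> 
    # .data[[demo]] looks up the column by name inside aes()
    ggplot(aes(x = .data[[demo]], y = unit, fill = academic_info)) +
    geom_violin(alpha = 0.2) + 
    labs(y = "", x = "", fill = "") +
    scale_y_continuous(expand = c(0,0)) +
    # one panel per measure, each with its own axis scales
    facet_wrap(~academic_info, ncol = 3, scales = "free") +
    theme(panel.spacing = unit(1, "lines"), 
          axis.text.x = element_text(angle = 30))
  # convert to an interactive plotly widget with a fixed size
  ggplotly(plot) |> 
    layout(autosize = FALSE, width = 825, height = 350)
}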

Race

demo_name = "Race"
plot_function(demo = demo_name)

Gender

demo_name = "Gender"
plot_function(demo = demo_name)

First-Generation

demo_name = "First-Generation"
plot_function(demo = demo_name)

Cohort

demo_name = "Cohort"
plot_function(demo = demo_name)

The NDU cohort has no term unit load data, so Relative Course Load is missing for all 147 of its students.
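A quick way to confirm this kind of gap is to count missing values per cohort. The sketch below uses a small hypothetical data frame (`toy_df`, with made-up values) in place of `sleep_df`, and base R rather than the tidyverse so that it runs on its own; with the real data, the cohort and relative course load columns of `sleep_df` would take the place of the toy columns.

```r
# Hypothetical toy data standing in for sleep_df; values are illustrative only.
toy_df <- data.frame(
  Cohort = c("CMU", "CMU", "NDU", "NDU", "UW"),
  course_load = c(-0.1, 0.5, NA, NA, 0.04)
)

# Number of missing course-load values in each cohort.
n_missing <- tapply(is.na(toy_df$course_load), toy_df$Cohort, sum)

# Total number of students in each cohort, for comparison.
n_total <- table(toy_df$Cohort)

print(n_missing)
print(n_total)
```

A cohort whose missing count equals its total, as NDU does in the toy data, contributes nothing to the corresponding violin panel.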

Time of Data Collection

demo_name = "time_collection"
plot_function(demo = demo_name)